Patent Specification

1. Title of Invention 

Speech Discrimination and Detection Apparatus

2. Claims 

A speech discrimination and detection apparatus comprising: 

a conversion circuit for converting an inputted signal contained in each of 
extraction periods of a pre-assigned length into line spectrum pair coefficients; 

a coefficient distance determining circuit for determining whether the distance 
between a pair of the mutually adjacent line spectrum pair coefficients is longer than a 
pre-assigned threshold distance; and 

a voiced sound discriminating circuit for discriminating a speech sound by judging 
whether the determination results of the coefficient distance determining circuit remained 
the same for more than a pre-determined length of time. 

3. Detailed Description of the Invention 

[Field of the industrial application of the Invention] 

The invention relates to an apparatus for discriminating an inputted speech sound, and
in particular to a speech discrimination and detection apparatus that is incorporated into
an apparatus such as a speech recognition apparatus and that discriminates the portions of
an inputted sound that contain speech.

[Description of the Prior Art] 

A speech discrimination and detection apparatus for discriminating and detecting
speech portions according to the prior art is shown in Fig.4 and comprises:

a circuit 8 for extracting the level of the inputted signal 1, which corresponds to an
inputted sound;

a threshold setting circuit 10 for pre-assigning a threshold value determined in
relation to parameters such as the noise level and/or the speech level at the time the sound
is inputted, and for issuing determination signals 11 notifying the starting and ending
points of a speech portion, wherein the inputted level signal 9 made available by the
circuit 8 is compared with the threshold value, a point is determined to be the starting
point of a speech portion when the inputted level signal 9 stays continuously higher than
the threshold value for more than a pre-assigned time, and a point is determined to be the
end point of the speech portion when the inputted level signal 9 stays continuously lower
than the threshold value for more than another pre-assigned time; and

a speech portion detecting circuit 12 for issuing speech detection result indicating 
signals 7 in response to a receipt of the determination signals 11 notifying starting and 
ending points of the speech portion. 

According to the prior art, each speech portion of an input signal is thus defined as the
period between a pair of starting and ending points determined as above.
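
The prior-art procedure described above can be sketched as follows. This is only an
illustrative sketch, not the circuit of Fig.4 itself: the function name, the use of
per-frame level values, and the expression of the two pre-assigned times as frame counts
(`min_on_frames`, `min_off_frames`) are assumptions made for the example.

```python
def detect_endpoints(levels, threshold, min_on_frames, min_off_frames):
    """Return (start, end) frame-index pairs of detected speech portions.

    levels         -- per-frame level values (stand-in for level signal 9)
    threshold      -- fixed threshold (stand-in for threshold setting circuit 10)
    min_on_frames  -- the level must stay above threshold for MORE than this many frames
    min_off_frames -- the level must stay below threshold for MORE than this many frames
    `end` is the index of the first frame of the quiet run that closed the portion.
    """
    portions = []
    in_speech = False
    run = 0        # length of the current run above (or below) the threshold
    start = None
    for i, level in enumerate(levels):
        if not in_speech:
            run = run + 1 if level > threshold else 0
            if run > min_on_frames:
                in_speech = True
                start = i - run + 1      # first frame of the loud run
                run = 0
        else:
            run = run + 1 if level < threshold else 0
            if run > min_off_frames:
                in_speech = False
                portions.append((start, i - run + 1))
                run = 0
    if in_speech:                         # signal ended while still in speech
        portions.append((start, len(levels)))
    return portions


# Hypothetical level values: four loud frames surrounded by quiet frames.
print(detect_endpoints([0, 0, 5, 5, 5, 5, 0, 0, 0, 0], 1, 2, 2))  # [(2, 6)]
```

Because the decision uses the level alone, any sufficiently long run of frames above the
threshold is reported as a speech portion, whatever its source.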

[Problems to be solved by the Invention] 

The prior art speech discrimination and detection apparatus explained above relies on
information associated with the power level of an inputted sound. As a result, it is prone
to picking up environmental noise and has the problem that it is difficult to separate the
inputted sound from the environmental noise.

[Means for solving the problem] 

A speech discrimination and detection apparatus of the present invention comprises:

a conversion circuit for converting an inputted signal contained in each of
extraction periods of a pre-assigned length into line spectrum pair coefficients;

a coefficient distance determining circuit for determining whether the distance
between a pair of the mutually adjacent line spectrum pair coefficients is longer than a
pre-assigned threshold distance; and

a voiced sound discriminating circuit for discriminating a speech sound by judging
whether the determination results of the coefficient distance determining circuit remained
the same for more than a pre-determined length of time.

[Operation] 

According to the present invention, therefore, it becomes possible to discriminate
speech portions from an input signal, even in the presence of environmental noise,
because the present invention is configured so that the distances between line spectrum
pair coefficients are analyzed to determine the speech portions of the input signal.

JP63-262693

[Embodiment of the Invention] 

An embodiment of the present invention is explained below with reference to the
related drawings.

Fig.1 is a block diagram of an embodiment of the present invention, and Figs.2 and 3
are charts for explaining the embodiment in Fig.1. An inputted signal 1 corresponding to an
inputted sound is expected in general to contain environmental noise. The line spectrum
pair converting circuit 2 converts the inputted signal 1 into a line spectrum pair
(hereinafter called LSP) coefficient signal 3 in accordance with the LSP method, which is a
kind of linear predictive coding method. Here, the LSP coefficients are parameters of the
frequency domain.

For instance, with an analysis order of 8, eight parameters $w_1, w_2, w_3, \dots, w_8$
are obtained as shown in Figs.2 and 3. Here, the analysis is performed at a sampling
frequency of 8 kHz, over the frequency band of 0.4 to 3.4 kHz (the same frequency band as
that of the telephone), and with an analysis frame period of 10 to 20 msec.
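
At a sampling frequency of 8 kHz, a frame period of 10 to 20 msec corresponds to 80 to
160 samples per frame. The sketch below shows that arithmetic. The function name and the
choice of consecutive, non-overlapping frames are assumptions made for illustration; the
specification does not state whether frames overlap or how they are windowed.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sample sequence into consecutive, non-overlapping frames."""
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    n_frames = len(samples) // frame_len            # any partial tail is dropped
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


signal = list(range(400))                # 50 msec of dummy samples at 8 kHz
frames = split_into_frames(signal, frame_ms=20)
print(len(frames), len(frames[0]))       # 2 160
```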

Further detail on the LSP may be found in an article titled "Speech synthesis method
using line spectrum frequencies as parameters and speech synthesis LSI", Nikkei
Electronics, Issue No. 257, pp. 128-158, 2 Feb. 1981.

The LSP coefficients $w_1$ to $w_P$ are parameters of the frequency domain, and they tend
to gather in the vicinities of the formant frequencies $F_1$ to $F_{P/2}$ associated with a
voiced sound. Here, the following relationship holds among the LSP coefficients:

$$0 < w_1 < w_2 < \dots < w_{P-1} < w_P < \pi$$

with $P$ being the analysis order.
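
For reference, the ordering above follows from the usual textbook definition of the LSP
coefficients, which this specification does not spell out. The names $A(z)$, $S(z)$ and
$D(z)$ and the sign convention of the predictor coefficients $a_k$ are assumptions of this
sketch. Given the $P$-th order linear prediction polynomial, a symmetric and an
antisymmetric polynomial are formed:

$$A(z) = 1 + \sum_{k=1}^{P} a_k z^{-k}, \qquad
S(z) = A(z) + z^{-(P+1)} A(z^{-1}), \qquad
D(z) = A(z) - z^{-(P+1)} A(z^{-1})$$

When the synthesis filter $1/A(z)$ is stable, the roots of $S(z)$ and $D(z)$ lie on the
unit circle and alternate with one another. For even $P$, after the trivial roots $z = -1$
of $S(z)$ and $z = +1$ of $D(z)$ are set aside, the remaining roots are
$e^{\pm j w_i}$, $i = 1, \dots, P$, and their angles give the ordered LSP coefficients. An
angle $w_i$ corresponds to the frequency

$$f_i = \frac{w_i}{2\pi} f_s$$

so that, at a sampling frequency $f_s$ of 8 kHz, the upper bound $\pi$ corresponds to 4 kHz.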

The coefficient distance determining circuit 4 calculates the distances between adjacent
line spectrum pair coefficients, $(w_2 - w_1), \dots, (w_8 - w_7)$, based on the line
spectrum pair coefficient signal 3, as shown in Fig.2.

Below is an example of a calculation method for the coefficient distances
$(w_n - w_{n-1})$.

Assuming that the LSP analysis is performed with an analysis order of $P$, the following
formulas are evaluated for each of $n = 2, 3, \dots, P$.

$$(w_n - w_{n-1}) < w_{TH1} \tag{1}$$

$$(w_n - w_{n-1}) < w_{TH2} \tag{2}$$

Here, $w_{TH1}$ and $w_{TH2}$ are threshold values associated with the LSP coefficient
distances $(w_n - w_{n-1})$, and they satisfy the relation $w_{TH1} < w_{TH2}$.

When it is determined that one or more of the LSP coefficients $w_1, \dots, w_P$ satisfy
formula (1) and, in addition, two or more of the LSP coefficients $w_1, \dots, w_P$ satisfy
formula (2), the coefficient distance determination result signal 5 indicates a voiced
sound. The voiced sound determining circuit 6 issues a speech determination result signal 7
after receiving, for 3 consecutive frames, the coefficient distance determination result
signal 5 indicating a voiced sound.
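
The decision rule and the 3-frame condition can be sketched as follows. This is an
illustrative sketch under stated assumptions, not the circuits 4 and 6 themselves: the
function names are hypothetical, the counts are taken over the adjacent distances for
n = 2, ..., P, and the threshold and coefficient values (in radians) are invented for the
example, since the specification gives no numerical values.

```python
def frame_is_voiced(lsp, w_th1, w_th2):
    """Apply formulas (1) and (2) to one frame of ascending LSP coefficients."""
    distances = [b - a for a, b in zip(lsp, lsp[1:])]   # w_n - w_(n-1), n = 2..P
    n_below_th1 = sum(d < w_th1 for d in distances)     # distances satisfying (1)
    n_below_th2 = sum(d < w_th2 for d in distances)     # distances satisfying (2)
    return n_below_th1 >= 1 and n_below_th2 >= 2


def speech_detected(frame_flags, required_frames=3):
    """Per-frame output: True once the voiced flag has held for 3 consecutive frames."""
    out, run = [], 0
    for voiced in frame_flags:
        run = run + 1 if voiced else 0      # length of the current voiced run
        out.append(run >= required_frames)
    return out


# Hypothetical values only; thresholds satisfy w_th1 < w_th2.
W_TH1, W_TH2 = 0.10, 0.15
voiced_lsp = [0.25, 0.33, 0.90, 1.02, 1.60, 1.90, 2.40, 2.80]  # two tight pairs
flat_lsp = [0.35 * k for k in range(1, 9)]                      # evenly spread

print(frame_is_voiced(voiced_lsp, W_TH1, W_TH2))   # True
print(frame_is_voiced(flat_lsp, W_TH1, W_TH2))     # False
print(speech_detected([True, True, False, True, True, True, True]))
# [False, False, False, False, False, True, True]
```

Since $w_{TH1} < w_{TH2}$, any distance that satisfies formula (1) also satisfies formula
(2), so the rule amounts to requiring at least two short distances, at least one of which
is shorter than the tighter threshold.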

Fig.2 shows the relation between the frequency spectrum and the LSP coefficients when a
voiced sound is inputted in the embodiment shown in Fig.1. Fig.3 shows the relation between
the frequency spectrum and the LSP coefficients when an unvoiced sound or an environmental
noise is inputted in the same embodiment shown in Fig.1.

As is clear from Fig.2, when the inputted sound is a voiced sound, the LSP coefficients
$w_1, \dots, w_8$ are positioned in the vicinities of the formant frequencies
$F_1, \dots, F_4$. In particular, because the resonance gain is in general larger for the
first formant frequency $F_1$, the LSP coefficients $w_1$ and $w_2$ are positioned closer to
that formant frequency. Hence, the LSP coefficient distance $(w_2 - w_1)$ becomes shorter
than the threshold $w_{TH1}$, and the LSP coefficient distance $(w_4 - w_3)$ near the second
formant frequency becomes shorter than the threshold $w_{TH2}$.

In contrast, when the inputted sound is an unvoiced sound or an environmental noise,
the associated frequency spectrum is very flat, as shown in Fig.3, and the LSP coefficients
$w_1, \dots, w_8$ are not positioned close together anywhere in the spectrum. As a result,
the LSP coefficient distances $(w_n - w_{n-1})$ do not become shorter than either
$w_{TH1}$ or $w_{TH2}$.

[Effect of the invention] 

As described above, the present invention determines whether an inputted signal is a
speech signal by analyzing the distances between line spectrum pair coefficients instead of
by comparing the level of the inputted signal. It therefore becomes possible to extract a
speech signal from an inputted signal even if the inputted signal is a mixture of speech
and environmental noise, provided that the inputted sound is a voiced sound. Incorporating
the apparatus of the invention into a speech recognition apparatus is effective in
improving its speech recognition success rate.

4. Brief explanation of drawings 

Fig.1 is a block diagram showing an embodiment of the present invention. Fig.2 and
Fig.3 are charts associated with the embodiment shown in Fig.1. Fig.4 is a block diagram
showing an example of the prior art.

2: line spectrum pair converting circuit
4: coefficient distance determining circuit
6: voiced sound determining circuit